Human Genetics and Genomics Advances — Latest Matching Preprints

1

Meta-analysis of over 8,000 individuals from Hawai'i and Samoa for genetic associations to cardiometabolic phenotypes

Dinh, B. L.; Wang, X.; Sheng, X.; Wan, P.; Srivastava, A. K.; Naseri, T.; Viali, S.; Wilkens, L.; Le Marchand, L.; Haiman, C. A.; Weeks, D.; Chiang, C. W. K.; Carlson, J. C.

2026-05-12 genetic and genomic medicine 10.64898/2026.05.08.26352761 medRxiv

Top 0.1%

6.5%

Although genome-wide association studies (GWAS) now routinely reveal genetic associations and biological insights in millions of individuals, underrepresentation of global populations, such as those from Polynesia, persists. These exclusions, often driven by logistical challenges and lack of data, prevent systematic identification of population-enriched associations, such as the association of the missense variant at the CREBRF locus with BMI and type 2 diabetes, which was discovered in Polynesian populations, where the variant is common, because it is rare in other global populations. Armed with the recently updated TOPMed imputation panel, which could benefit studies in diverse populations that previously had poorer imputation performance, we performed the first GWAS of Native Hawaiians and the largest to date of Polynesian-ancestry populations (combined N up to 8,461) to identify population-enriched associations for 13 adiposity and cardiometabolic traits available across both cohorts: BMI, fasting glucose, fasting insulin, HDL, height, hip circumference, HOMA-IR, LDL, T2D, total cholesterol, triglycerides, waist circumference, and waist-hip ratio. We found 25 trait-locus associations that met genome-wide significance: 20 previously reported or known associations and 5 associations newly confirmed via meta-analysis. In particular, with improved statistical power, we were able to confirm the suspected association between the missense CREBRF variant and fasting glucose levels. The remaining 4 potentially novel locus-trait associations for BMI, LDL, and waist-hip ratio, however, were not replicated in multi-ethnic datasets from All of Us despite reasonable power to replicate. The lack of Polynesian-enriched findings outside of the CREBRF locus informs the bounds of the effect sizes or frequency of any enriched variants, and suggests that further expansion of cohort sizes from this region of the world and improved imputation references specific to these populations are needed to identify more population-enriched associations.

2

Integrated luminescence and phenotypic profiling for drug discovery in a zebrafish model of Marfan syndrome

Horvat, M.; Caboor, L.; De Rycke, K.; Mennens, L.; Daniels, E.; Wyseur, J.; Verhelst, E.; Roos, I.; Rodriguez-Rovira, I.; Egea, G.; De Backer, J.; Sips, P.

2026-05-13 pharmacology and toxicology 10.64898/2026.05.12.722859 medRxiv

Top 0.1%

4.3%

Background: Marfan syndrome (MFS) is a life-threatening heritable connective tissue disorder caused by pathogenic variants in fibrillin-1, characterized by progressive cardiovascular disease. Current medical therapies slow disease progression but do not prevent major complications, underscoring the need for new treatment strategies and unbiased discovery approaches. Methods: We used a zebrafish model of MFS lacking fibrillin-3 (fbn3-/-), which recapitulates key cardiovascular phenotypes including cardiac stress, valvular defects, arrhythmia, and aortic dilation. To enable sensitive, quantitative assessment of cardiac stress, we generated a novel transgenic zebrafish reporter expressing secreted nanoluciferase under control of the stress-responsive nppb promoter. This reporter was combined with morphological phenotyping and bulbus arteriosus (BA) imaging. We evaluated standard MFS therapies and targeted modulators of TGF-β signaling, and performed an unbiased high-throughput drug screen of over 1,500 clinically approved compounds across multiple developmental treatment windows. Results: fbn3-/- larvae exhibited markedly elevated nppb activity that correlated with phenotypic severity and peaked during stages of highest mortality. The nanoluciferase reporter provided a ~1,000-fold dynamic range, substantially outperforming Firefly luciferase-based assays. Pharmacological inhibition of TGF-β signaling produced transient or deleterious effects, while β-blockers, losartan, and allopurinol failed to consistently improve cardiac stress, pericardial edema, or BA dilation. The unbiased high-throughput drug screen identified a small number of primary and secondary hits; however, none demonstrated reproducible phenotypic rescue upon rigorous multi-dose, multi-time window validation. Conclusions: This study establishes a sensitive zebrafish-based platform for early, quantitative assessment of cardiovascular stress in MFS. Our findings highlight the limited efficacy of current therapies, the context-dependent nature of TGF-β modulation, and the biological complexity underlying MFS pathogenesis. Although no definitive therapeutic candidates were identified, this work lays a robust foundation for expanded unbiased discovery efforts aimed at identifying disease-modifying interventions for MFS.

3

Prioritizing embryos with lower homozygosity may reduce disease risk in children of related individuals undergoing preimplantation genetic testing

Wolfram, T.; Ahangari, M.; Davidson, I.; Wartschinski, L.; Li, J. H.; Eyre, M.; Stern, D.; Schleede, J.; Haghighi, A.; Carmi, S.; Christensen, M.

2026-06-04 genetic and genomic medicine 10.64898/2026.05.30.26354526 medRxiv

Top 0.1%

4.0%

Consanguinity is a reproductive union between individuals who share a recent common ancestor. These unions are common in many regions of the world and increase the burden of rare recessive disorders by elevating autozygosity in offspring. Current reproductive genetic screening focuses on a limited set of known pathogenic variants, leaving most recessive risk unaddressed. Here we argue that embryo-level autozygosity, quantified as the fraction of the genome in long runs of homozygosity (FROH), is a potentially actionable genomic biomarker. It can be integrated into routine preimplantation genetic testing as a homozygosity-informed embryo-prioritization framework (PGT-H), layered onto existing embryo biopsy workflows when couples are already undergoing IVF with PGT-A or PGT-M. Using forward simulations of first-cousin and double-first-cousin couples, we show that siblings conceived by the same couple span a wide range of FROH; selecting the lowest-FROH candidate from a cohort of five embryos reduces FROH by approximately 40% on average. Combining these reductions with empirical effect-size estimates, we estimate that for first-cousin couples this strategy could reduce the risk of intellectual disability by roughly 35-45% (corresponding to an absolute risk reduction of about 1.8-2.2%) and potentially reduce excess recessive disease burden, while also modestly reducing the risk of common diseases such as type 2 diabetes. We outline how existing PGT-A and PGT-M workflows could potentially be extended to report embryo-level FROH and discuss ethical and counseling considerations. Autozygosity-based embryo prioritization offers a principled way to address a component of recessive risk that current variant-centric approaches miss.
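To make the FROH quantity concrete, the following is a minimal sketch of how it could be computed from a list of called runs of homozygosity and used to rank embryos. It is not the authors' pipeline: the 1.5 Mb length threshold, the autosomal genome length, and the function names `froh` and `pick_lowest_froh` are all illustrative assumptions.

```python
# Assumed values for illustration only; the preprint's thresholds may differ.
AUTOSOME_BP = 2_875_000_000   # hypothetical round figure for autosomal length
MIN_ROH_BP = 1_500_000        # hypothetical cutoff for a "long" ROH


def froh(roh_segments, genome_bp=AUTOSOME_BP, min_bp=MIN_ROH_BP):
    """Fraction of the genome covered by ROH segments of at least min_bp.

    roh_segments is a list of (start, end) positions in base pairs; only
    segment lengths are used, so segments may come from any chromosome.
    """
    long_total = sum(end - start for start, end in roh_segments
                     if end - start >= min_bp)
    return long_total / genome_bp


def pick_lowest_froh(embryos, genome_bp=AUTOSOME_BP, min_bp=MIN_ROH_BP):
    """Return (embryo_id, froh) for the embryo with the lowest FROH.

    embryos maps an embryo identifier to its list of ROH segments.
    """
    scores = {name: froh(segs, genome_bp, min_bp)
              for name, segs in embryos.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

For example, `pick_lowest_froh({"E1": [(0, 100_000_000)], "E2": [(0, 50_000_000)]}, genome_bp=1_000_000_000)` selects `"E2"`, whose long ROH cover 5% of the toy genome compared with 10% for `"E1"`.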

4

Comprehensive analysis of de novo variants across 2,497 orofacial cleft trios reveals novel genetic drivers of disease

Kurtas, N. E.; Sanchis-Juan, A.; Shin, E.; Curtis, S. W.; Robinson, K. R.; Lee, A. S.; Alade, A. A.; Zhao, X.; Fu, J.; Diaz Perez, K. K.; Gowans, J. J. L.; Eshete, M. A.; Adeyemo, W. L.; Buxo, C. J.; Padilla, C. D.; Poletta, F. A.; Carreno Torres, A.; Wehby, G. L.; Hecht, J. T.; Moreno Uribe, L. M.; Mukhopadhyay, N.; Shaffer, J. R.; Weinberg, S. M.; Murray, J. C.; Beaty, T. H.; Butali, A.; Talkowski, M.; Marazita, M. L.; Leslie-Clarkson, E. J.; Brand, H.

2026-05-24 genetic and genomic medicine 10.64898/2026.05.21.26352934 medRxiv

Top 0.1%

3.6%

Background: Orofacial clefts (OFCs) and other palate abnormalities (PAs) are among the most common birth defects worldwide and are characterized by the abnormal formation of the lip and/or palate. Genetic studies have traditionally classified OFC cases as either syndromic, involving OFCs alongside other congenital anomalies, or nonsyndromic, which represent the majority of cases and occur in isolation. Emerging genomic evidence indicates that genes traditionally associated with syndromic forms of OFC can also harbor variants contributing to isolated cases, challenging the notion of a strict dichotomy between these categories and supporting their integration for gene discovery. Methods: In this study, we applied multiple analytic approaches to characterize the genetic architecture of OFC and PAs by integrating genomic data from 2,497 trios with a proband affected by an OFC (n=2,080) or a PA (n=417). We compared these findings across OFC subtypes and syndromic status with those from 5,515 control trios to identify enriched biological pathways and mechanisms and to prioritize candidate genes using variant burden testing. Results: We observed a significant enrichment of de novo protein-truncating and damaging missense variants in cases compared to controls (OR = 2.17, p = 1.21 × 10^-32), with particularly strong signals in biologically relevant gene sets involving OFC-associated, constrained, Mendelian disorder, and mouse candidate genes. Variant burden testing identified 39 OFC risk genes at FDR ≤ 0.05, which we then integrated with 593 established OFC genes to interrogate the functional underpinnings of OFC via network analysis. This analysis revealed 309 high-order interactor genes not previously associated with OFC. Notably, this OFC network clustered into ten distinct biological pathways, with nucleosome-associated genes showing significant enrichment among cases in our cohort (OR = 14.8, p = 8.1 × 10^-4). In a final integrative step, we combined evidence across all analyses to nominate 231 candidate genes, 32 of which contained at least two deleterious de novo variants in our cohort. Conclusions: These findings underscore the value of integrating diverse OFC and PA subtypes, syndromic status, and variant classes to refine the genetic architecture of these disorders, highlighting both phenotypic expansion of known disease genes and the emergence of novel gene-phenotype associations.

5

Contextualizing the Utility of Polygenic Risk Scores using Absolute Risk Models in Diverse Ancestry Populations

Chatterjee, N.; Martina, F.; Kachuri, L.; Natarajan, P.; Witte, J.; Huo, D.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354842 medRxiv

Top 0.1%

3.5%

Polygenic risk scores (PRSs) are emerging as powerful tools for quantifying inherited risk for common diseases and, in some cases, are approaching clinical implementation. A major concern for PRS implementation is their limited accuracy in non-European populations, particularly in those of African ancestry. However, past evaluations have focused on metrics such as relative risk or AUC, which do not capture background risk arising from contextual factors. We introduce a novel measure of variable importance, the conditional average derivative estimator (CADE), to evaluate PRS utility across diverse contexts and populations within absolute risk models that integrate PRSs with other relevant risk factors. We illustrate this framework by integrating PRSs for breast and prostate cancer within age-specific absolute risk models for incidence and mortality fit using individual-level data from the All of Us Research Program with inputs from the National Cancer Institute SEER cancer registry. Our projections show that although the PRSs are known to have the lowest discriminatory accuracy in African Americans (AA), there are contexts in which they provide greater utility, such as for the stratification of prostate cancer risk and mortality, where the CADE values for AA were 2- and 7-fold higher than for European Americans. These findings suggest that conclusions about the limited clinical utility of PRS in non-European populations may be premature and underscore the need to quantify PRS risk-stratification utility at the absolute-risk level, while accounting for disease onset, survival, and broader health and economic factors.

6

Benchmarking of local ancestry inference with different assays and parameters

Motegi, T.; Huang, F.; Campbell, J. D.

2026-05-21 genomics 10.64898/2026.05.18.726085 medRxiv

Top 0.1%

3.1%

Local ancestry inference (LAI) enables high-resolution characterization of chromosomal segments inherited from distinct ancestral populations, offering unique insights into genetic architecture in admixed cohorts. While LAI is commonly performed with high-coverage whole-genome sequencing (WGS), the ability of other genotyping assays or varying sequencing depths to support LAI has not been thoroughly benchmarked. In this study, we systematically evaluated the accuracy of LAI across SNP microarrays, whole-exome sequencing (WES), and ultra low-pass WGS (ULP-WGS) using diverse validation samples and state-of-the-art imputation pipelines. We show that ULP-WGS, when paired with GLIMPSE2, achieves robust accuracy at 0.25x coverage with a minimum genome window size of 0.5 centimorgans, with mean accuracy minus one standard deviation exceeding 95%. For WES, using "on-target" reads alone yields suboptimal performance, particularly for European and South Asian ancestries, with accuracy less than 79.1% and 70.6%, respectively. However, incorporating "off-target" reads in WES and utilizing GLIMPSE2 substantially improved accuracy to ≥95% with a minimum window size of 0.2 centimorgans. We further evaluated formalin-fixed, paraffin-embedded (FFPE) samples and found that LAI could be performed successfully using WES data with accuracies of ≥95% at a minimum window size of 0.5 centimorgans. In contrast, SNP microarrays did not achieve substantial accuracies at any window size (≤95%). Together, these results demonstrate that LAI is achievable without conventional high-coverage WGS and establish optimal parameters for LAI across platforms.

7

Estimating uncertainty in family-based GWAS

Miao, X.; Edge, M. D.; Harpak, A.

2026-05-14 genetics 10.64898/2026.05.11.724392 medRxiv

Top 0.2%

2.3%

Standard genome-wide association studies (GWASs) are vulnerable to confounding factors, including stratification, assortative mating, and dynastic effects. Family studies such as sibling-based GWAS (sib-GWAS) mitigate such confounding and are becoming the tool of choice for teasing apart direct genetic effects (causal effects of one's genotype on one's own phenotype) from other factors. However, due in part to their smaller sample sizes, sib-GWAS allelic effect estimates are substantially more variable than standard (i.e., population-based) GWAS estimates. The quantification of this uncertainty is essential for many uses of sib-GWAS, including polygenic scoring, causal inference (e.g., Mendelian randomization), disentangling direct from indirect familial effects, and measuring assortative mating. Here, we investigate sources of uncertainty in sib-GWAS allelic effect estimators. We study their impacts on the biases of three uncertainty measurement methods, including two that are commonly used and a new resampling-based approach we propose. We find that heterogeneity in allelic effects or heteroskedasticity across families (e.g., due to variation in genetic backgrounds or environments) can bias existing methods, and that this bias is more severe for small samples and rare variants. In contrast, the resampling-based approach we propose is approximately unbiased under all scenarios we considered. We validate our theoretical predictions, as well as the importance of effect heterogeneity and heteroskedasticity, using simulations and empirical analysis in the UK Biobank. In sum, this study clarifies the sources of uncertainty in family-based genotype-phenotype association studies and provides a robust method to estimate uncertainty.
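The abstract does not describe the proposed resampling scheme, so the sketch below illustrates only the general idea with a generic family-level bootstrap, which should not be read as the authors' method. It assumes sibling pairs, a sibling-difference slope estimator through the origin, and hypothetical function names (`sib_diff_beta`, `family_bootstrap_se`).

```python
import random
import statistics


def sib_diff_beta(pairs):
    """Sibling-difference effect estimate for one variant.

    pairs is a list of (g1, y1, g2, y2): genotype and phenotype for each
    sibling in a pair. The estimate is the no-intercept regression slope of
    the phenotype difference on the genotype difference.
    """
    num = sum((g1 - g2) * (y1 - y2) for g1, y1, g2, y2 in pairs)
    den = sum((g1 - g2) ** 2 for g1, _, g2, _ in pairs)
    if den == 0:
        raise ValueError("no genotype-discordant sibling pairs")
    return num / den


def family_bootstrap_se(pairs, n_boot=500, seed=0):
    """Standard error from resampling whole families with replacement.

    Resampling families (not individuals) keeps each sibling pair intact,
    so between-family differences in effects or noise carry into the spread
    of the bootstrap estimates.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(pairs, k=len(pairs))
        # Skip resamples with no discordant pair: the slope is undefined.
        if any(g1 != g2 for g1, _, g2, _ in sample):
            estimates.append(sib_diff_beta(sample))
    return statistics.stdev(estimates)
```

Resampling at the family level is a design choice made here for illustration; it makes no assumption that every family shares the same residual variance, which is the situation the abstract identifies as problematic for existing methods.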

8

Epistatic SNP network analysis (ESNA): A scalable framework for genome-wide detection of higher-order genetic interactions

Zhang, Y.; Han, M.; Ambalavanan, A.; Topouza, D.; Fang, Z. Y.; Stickley, S. A.; Anand, S.; Turvey, S.; Mandhane, P. J.; Simons, E.; Moraes, T. J.; Subbarao, P.; Choi, J.; Duan, Q.

2026-05-13 genetic and genomic medicine 10.64898/2026.05.08.26352667 medRxiv

Top 0.2%

2.1%

Although genome-wide association studies (GWASs) have been widely applied to investigate the genetic basis of common traits and diseases in human populations, the associated loci do not fully account for the estimated heritability. The missing heritability may be explained, in part, by epistasis or gene-gene interactions. Existing methods for detecting epistasis, however, are limited to pair-wise interactions and/or targeted genomic regions. Here, we present a novel model, termed the Epistatic SNP Network Analysis (ESNA), which detects higher-order epistatic interactions using genome-wide SNP data. ESNA employs a scale-free network algorithm within a parallel computing framework that identifies modules of correlated SNPs, potentially interacting variants that converge on common biological pathways, while enhancing computational efficiency. We applied ESNA to investigate epistatic interactions contributing to respiratory outcomes such as recurrent wheeze and asthma among preschool-aged children in the CHILD Cohort Study. Using genome-wide data comprising 775,569 SNPs from 1,899 children, ESNA identified 914 SNP network modules, 9 of which were significantly associated with recurrent wheeze between ages 2 and 5 years (P < 5.47 × 10^-5). Furthermore, 7 of these wheeze-associated modules were also associated with asthma by age 5 years (P < 5.47 × 10^-5). Pathway enrichment analysis revealed that the associated modules consist of SNPs located in genes previously implicated in asthma and related biological processes, such as cellular response to stimuli and nervous system development. Compared to existing network-based methods for epistasis, ESNA demonstrated substantial improvements in computational efficiency, reducing memory usage by 50% and processing genome-wide SNP data 48 times faster. The code implementation and documentation are available at https://github.com/ComputationalGenomicsLaboratory/ESNA.
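The reported significance threshold is numerically consistent with a Bonferroni correction of a 0.05 family-wise error rate across the 914 modules. The abstract does not state how the threshold was derived, so this reading is an assumption, and `bonferroni_threshold` is a hypothetical helper name.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed inputs: alpha = 0.05 and one test per SNP network module.
threshold = bonferroni_threshold(0.05, 914)
print(f"{threshold:.3g}")  # prints 5.47e-05
```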

9

Detecting genomic regions enriched for reciprocal recombination in autism spectrum disorder

Mahoney, C. F.; Salter-Townshend, M.; Fitzpatrick, D. J.; Shields, D. C.

2026-05-27 genetics 10.64898/2026.05.26.727863 medRxiv

Top 0.2%

2.0%

Meiotic recombination is an important means of increasing genetic diversity by generating novel haplotypes in a population. Recombination separates linked loci extremely slowly in some regions, so genetic variants in high linkage disequilibrium may become co-adapted. Reciprocal recombination that separates co-adapted variants may generate a deleterious de novo haplotype that contributes to disease. We developed statistical methods to detect genomic regions of recombination excess in two different family-based study designs. We identified recombination in the Simons Simplex Collection in 273 simplex families with one child with autism spectrum disorder (ASD) and at least two unaffected children, in which recombinations can be mapped to the proband and contrasted with the recombination counts in unaffected siblings; and in 1,802 families with two children, where the number of recombinations identified can be contrasted with the expectation from a reference recombination map. Both strategies revealed a tail of low p-values for loci of interest that contrasted with the rest of the distribution. Permutation and bootstrap tests did not identify genome-wide primary findings in either cohort, but the most significant three-child cohort locus of recombination excess (between cadherin genes CDH4 and CDH26) replicated in the two-child cohort (p=0.01). While this replication strategy was not defined a priori, five of the most recombination-enriched bins identified candidate ASD genes (p=0.02; WWOX, ADAMTS16, INSR, ADARB2, and HS6ST1). Since the six identified loci were not identified as regions of high de novo copy number variation in the study cohort and no CNVs were detected in any of the recombinant probands in the identified regions, they represent candidates for reciprocal recombinations generating unfavourable haplotypes for these genes. This study highlights a previously unidentified source of clinical genetic variability contributing to the molecular aetiology of ASD.

Author summary: Autism spectrum disorder (ASD) is a constellation of neurodevelopmental disabilities characterised by deficits in social communication and repetitive patterns of behaviour. While ASD is highly heritable, its genetic basis is complex and poorly understood. While some highly penetrant types of genetic variation have been identified, most people with ASD carry a large number of variants that each contribute a small amount to their overall phenotype. In addition to mutations in individual genes, changes in the configuration of genes along a chromosome may contribute to ASD. Here, we describe a method for identifying regions where such new configurations have occurred through recombination and attempt to find regions where such changes are more common in autistic children than in their non-autistic siblings. We explore recombination as a source of genetic variation contributing to autism, which has potential to inform clinicians in providing services to autistic people and their families.

10

Calibrated Prediction Intervals for Polygenic Scores: Updated Comparisons, Contextual Calibration, and Data Normalization

Chang, X.; Hou, S.; Zhou, X.

2026-05-19 genetic and genomic medicine 10.64898/2026.05.15.26353336 medRxiv

Top 0.2%

1.9%

Calibrated prediction intervals for polygenic scores (PGS) are essential for communicating individual-level uncertainty in genomic medicine. We present updated comparisons of two methods for constructing such intervals: CalPred, a parametric approach, and PredInterval, a non-parametric approach. Our results show that both methods can achieve calibrated coverage, although CalPred additionally requires a sufficiently large calibration set. The two methods also exhibit complementary trade-offs with respect to dataset size and risk identification. We further show that contextual calibration, as introduced in Hou et al. and followed in Shi et al., is most naturally achieved through appropriate phenotype normalization and data preprocessing. Apparent miscalibration can arise from inadequate normalization or from providing contextual information to some methods but not others. In UK Biobank, standard GWAS phenotype normalization procedures are sufficient to achieve contextual calibration for traits analyzed. In the extreme simulations of Hou et al. and Shi et al., supplying contextual covariates to PredInterval restores contextual calibration without normalization, and appropriate normalization can achieve contextual calibration without supplying covariates, while also substantially improving upstream tasks including association power and PGS accuracy. Together, these results underscore the central role of phenotype normalization and data preprocessing in GWAS analyses, including reliable uncertainty quantification for PGS.

11

When can whole-genome SNP heritability be reliably estimated from summary statistics?

Pham, B. K.; Davenport, S.; Azriel, D.; Schwartzman, A.

2026-05-16 genetics 10.64898/2026.05.13.724972 medRxiv

Top 0.2%

1.9%

LD Score Regression (LDSC) is a prominent method that estimates whole-genome SNP heritability from summary statistics via the slope of a linear regression of GWAS test statistics for a trait of interest against LD scores. The LDSC authors claimed that the free intercept in the regression accounts for confounding bias such as population stratification. In this study, we argue that the intercept in LDSC must be fixed to 1 for accurate SNP heritability estimation. We show both theoretically and with simulations that the estimated intercept does not accurately capture population stratification effects, and that it adversely affects the accuracy of the heritability estimate, introducing bias and increasing variance. Fixing the intercept to 1 eliminates bias and reduces variance when no population stratification is present. On the other hand, under population stratification, LDSC is biased with both the free and the fixed intercept. Additionally, we show that standard errors in LDSC are underestimated, potentially leading to false positives in downstream GWAS analyses.
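The contrast between a free and a fixed intercept can be sketched with the standard LDSC relationship, in which the expected chi-square statistic of SNP j is intercept + (N h² / M) ℓ_j for sample size N, SNP count M, and LD score ℓ_j. The code below is a simplified, unweighted illustration only: the LDSC software uses weighted regression and block-jackknife standard errors, neither of which is reproduced here, and `ldsc_h2` is a hypothetical function name.

```python
def ldsc_h2(chi2, ld_scores, n_samples, n_snps, fix_intercept=True):
    """Unweighted LD score regression sketch.

    Model: E[chi2_j] = intercept + (n_samples * h2 / n_snps) * ld_scores[j].
    Returns (h2_estimate, intercept).
    """
    if len(chi2) != len(ld_scores) or len(chi2) < 2:
        raise ValueError("need two or more SNPs with matching inputs")
    if fix_intercept:
        # Intercept constrained to 1: regress (chi2 - 1) on the LD score
        # through the origin.
        intercept = 1.0
        num = sum(l * (c - 1.0) for c, l in zip(chi2, ld_scores))
        den = sum(l * l for l in ld_scores)
        slope = num / den
    else:
        # Free intercept: ordinary least squares with slope and intercept.
        k = len(chi2)
        mean_l = sum(ld_scores) / k
        mean_c = sum(chi2) / k
        sxy = sum((l - mean_l) * (c - mean_c)
                  for c, l in zip(chi2, ld_scores))
        sxx = sum((l - mean_l) ** 2 for l in ld_scores)
        slope = sxy / sxx
        intercept = mean_c - slope * mean_l
    # The slope equals N * h2 / M, so rescale to obtain h2.
    return slope * n_snps / n_samples, intercept
```

On noise-free inputs that follow the model exactly with an intercept of 1, both variants return the same heritability; they diverge once sampling noise or confounding enters the test statistics, which is the regime the preprint analyzes.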

12

Genomic-Relatedness Matching Expands Population Coverage, Improves Power, and Reduces Bias in Genetic Association Analyses

Jaishankar, D.; Gjorgjieva, T.; Jala, J.; Swigert, J.; Young, A. S.; Benjamin, D. J.; Cesarini, D. A.; Turley, P.

2026-05-18 genetic and genomic medicine 10.64898/2026.05.14.26353140 medRxiv

Top 0.3%

1.7%

We introduce a novel approach, Genomic-Relatedness-Matched Association (GRMA) studies, as an alternative to genome-wide association studies (GWAS). GWAS are typically restricted to samples of mostly unrelated individuals with a single, shared continental ancestry, yet can still be biased by gene-environment correlation and assortative mating. In contrast, GRMA can be implemented in ancestrally diverse samples (retaining individuals of mixed or underrepresented ancestries and eliminating the need to assign labels to ancestry groups) and can reduce bias relative to standard GWAS. GRMA matches each individual to a group of controls whose pairwise relatedness with the individual exceeds a user-specified threshold. It generates SNP-level summary statistics based on within-group associations. In applications using the UK Biobank and All of Us data, we find that GRMA compares favorably to GWAS methods in terms of bias, precision, and population coverage. GRMA enables several novel findings; for example, we find that "genetic nurture" is unlikely to be an important source of genome-wide bias in population GWAS of body mass index, height, and educational attainment. The method is computationally efficient and supported by open-source software, facilitating its application in large-scale scientific and health-related studies.

13

Functionally informed annotation influences pathway-specific polygenic risk and disease inference in Alzheimer's disease

Bazemore, K.; Iqbal, T.; Kuzma, A. B.; Grant, S. F. A.; Schellenberg, G. D.; Wang, L.-S.; Chesi, A.; Jin, J.; Naj, A. C.

2026-05-26 epidemiology 10.64898/2026.05.25.26353905 medRxiv

Top 0.3%

1.7%

Pathway-specific polygenic risk scores (pathway-PRS) measure aggregate genetic risk across single nucleotide variants (SNVs) annotated to genes in a pathway of interest. In most applications, SNV-to-gene annotation is based on SNV position with respect to gene boundaries. This approach is ill-suited for incorporating non-coding SNVs, which can regulate gene expression over long distances and represent a large proportion of risk variants for Alzheimer's disease (AD). Here, we compare the performance of AD pathway-PRS across SNV-to-gene annotation strategies that integrate varying levels of functional genomic data, including adult brain chromatin interaction and expression quantitative trait loci (eQTL) data. In the UK Biobank (n=328,526), including AD cases defined by ICD-9/10 codes (n=3,043) and by family history of AD/dementia (n=38,589), we show that the annotation strategy integrating chromatin interaction and eQTL data consistently improves pathway-PRS performance. We replicate this finding in independent data from the Alzheimer's Disease Genetics Consortium (n=3,370). We further find that pathway-PRS associations with AD vary by annotation strategy and that power to detect sex-dependent and age-at-onset associations is increased with integrative annotation. Together, these findings support the use of functionally informed SNV-to-gene annotation for pathway-PRS construction and highlight the importance of applying multiple annotation strategies for robust inference.

14

Integrated Transcriptomic and Functional Analysis Reveals Tissue-Specific Molecular Pathology in Adolescent Idiopathic Scoliosis

Ramkhalawan, D.; Parrales, P.; Koesterich, J.; Montoya-Vazquez, G.; Cuna, C.; Kreimer, A.; McQuerry, J.; Ihnow, S.; Makki, N.

2026-05-29 genetics 10.64898/2026.05.27.727643 medRxiv

Top 0.3%

1.6%

Adolescent idiopathic scoliosis (AIS), the spontaneous development of a lateral spine curvature during puberty, is the most common pediatric spine disorder, affecting ~3% of children worldwide. As the underlying etiology remains unclear, AIS is treated purely symptomatically, initially by bracing and ultimately by highly invasive, costly surgeries. Genome-wide association studies (GWAS) have identified numerous risk loci in non-coding genomic regions, making it difficult to link them to a biological function. To address this, we performed a multi-tissue investigation to connect genetic risk to tissue-specific molecular pathology. We conducted RNA sequencing on the primary tissues implicated in AIS, paraspinal muscle and spinal cartilage, from patients and unaffected controls. In paraspinal muscle, we identified differentially expressed genes (DEGs) enriched for pathways related to muscle structure, myogenesis, and metabolism. Key upregulated genes include the transcription factor EGR1 and structural components such as MYH1. In spinal cartilage, we found enrichment of genes related to TGFβ and FoxO signaling, as well as metabolic pathways. Notably, genes crucial for chondrocyte differentiation (e.g., SOX5 and SOX6) were significantly downregulated. We then examined genes at known GWAS loci and found that several risk-associated genes were differentially expressed in one or both tissues. To investigate the function of non-coding variants at these loci, we identified and validated several enhancer elements harboring AIS risk SNPs at the BCL2, ADGRG6, BNC2, and FTO loci. We reveal distinct pathological signatures in muscle and cartilage and lay the foundation for connecting non-coding genetic risk to the dysregulation of key developmental and structural pathways.

15

HiFiMAP: High-resolution fast identity-by-descent mapping test

Guo, B.; Naseri, A.; Xie, Z.; Sarnowski, C.; Zhi, D.; Chen, H.

2026-05-17 genetic and genomic medicine 10.64898/2026.05.06.26352570 medRxiv

Top 0.3%

1.6%

Although traditional genome-wide association studies (GWAS) have identified numerous loci, they often ignore phased haplotype information. Identity-by-descent (IBD) mapping captures these extended haplotypic effects by modeling shared ancestral segments. However, standard statistical mapping of these segments scales poorly with biobank-sized cohorts and short IBD segments that capture older evolutionary events. To overcome this computational bottleneck, existing scalable IBD mapping frameworks aggregate shared segments into fixed sliding windows. While computationally efficient, this window-based approach generates association signals at a low resolution that often span hundreds of kilobases. To address this issue, here we present a novel High-resolution Fast IBD Mapping test (HiFiMAP) that takes snapshots of IBD segments at the single nucleotide polymorphism (SNP) level resolution. Simulation studies confirm that HiFiMAP maintains well-controlled type I error rates and exhibits superior statistical power for detecting rare variants and haplotype effects using short IBD segments. In a UK Biobank (UKB) benchmark (N=407,681), HiFiMAP mapped 640,899 SNPs at 1.92 CPU seconds per test, massively outperforming existing window-based methods (95.2 CPU seconds per test for 3,403 windows). Furthermore, applied to high-dimensional brain imaging phenotypes (N~36,000), HiFiMAP identified five novel associations previously undetected by standard GWAS approaches, including key central nervous system regulators like NR2F1 and NSF/WNT3. By refining large testing windows into highly specific genomic variants, HiFiMAP empowers biobank-scale, SNP-level resolution mapping to accurately pinpoint complex trait architectures.

16

Conditional and marginal SNP-heritability to leverage ancestral and environmental diversity

Singh Sachan, A. N.; Schwartzman, A.; Azriel, D.

2026-05-29 genetics 10.64898/2026.05.28.728536 medRxiv

Top 0.3%

1.6%

SNP-heritability is defined as the fraction of variance of a trait that is explained by the SNPs in a genome-wide association study. Several methodologies have been proposed to estimate this quantity. More recent methods aim to do so with ancestrally diverse datasets and yet obtain a single heritability for an entire dataset, which we refer to as marginal heritability. However, the different underlying subpopulations that compose a genetically diverse dataset might have different environmental and genetic exposures, and thus may have different heritabilities. To address this, we propose a conditional SNP-heritability approach that allows estimation of multiple SNP-heritabilities on a dataset, corresponding to different ancestral compositions and environmental exposures. We take a careful statistical approach, including estimation of conditional genetic and environmental variances, and calculation of standard errors via a combination of the delta method with bootstrapping. We validate our method via extensive simulations. We then apply it to an ancestrally and socio-economically diverse dataset of 6,603 subjects aged around 9 to 11 from the Adolescent Brain Cognitive Development study, and illustrate how the SNP-heritability of intelligence scores can change due to differing extrinsic variances in different socio-economic groups, which is consistent with previous work in the literature. This conditional estimation approach can be a valuable tool for understanding differences in risks across subpopulations. Our work here improves on existing methodology and allows us to leverage the heterogeneity of the data to obtain new insights.
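
For orientation, the distinction between marginal and conditional SNP-heritability can be written roughly as below. The notation is an assumption made for illustration and is not taken from the preprint: \sigma_g^2 and \sigma_e^2 denote genetic and environmental (extrinsic) variance, and x stands for a covariate profile such as ancestral composition or socio-economic group.

```latex
% Marginal SNP-heritability: one value for the whole dataset
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}

% Conditional SNP-heritability: one value per covariate profile x (assumed notation)
h^2(x) = \frac{\sigma_g^2(x)}{\sigma_g^2(x) + \sigma_e^2(x)}

% Illustrative arithmetic with made-up values: same genetic variance, different extrinsic variance
\frac{0.5}{0.5 + 0.5} = 0.50, \qquad \frac{0.5}{0.5 + 1.5} = 0.25
```

The made-up numbers show only the mechanism the abstract describes: with the genetic variance held fixed, a larger extrinsic variance in one group lowers that group's heritability.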

17

Elevated conformational dynamics makes ACKR3 activation-prone and G protein-incompetent

Wang, K.; Ngo, T.; Khare, E.; Chitsazi, R.; Roy, S.; Schafer, C. T.; Handel, T. M.; Kufareva, I.

2026-05-20 pharmacology and toxicology 10.64898/2026.05.17.725760 medRxiv

Top 0.3%

1.5%

The atypical receptor ACKR3 works together with the canonical chemokine receptor CXCR4 to drive cell migration along gradients of their shared agonist CXCL12. CXCR4 promotes chemotaxis by activating canonical G protein pathways and recruiting β-arrestins. ACKR3 indirectly regulates CXCR4-mediated chemotaxis by scavenging CXCL12. Unlike canonical chemokine receptors, ACKR3 does not couple to G proteins and instead is 100% biased towards β-arrestins. CXCR4 activation by CXCL12 is exquisitely sensitive to subtle changes in both receptor and ligand. By contrast, ACKR3 is activation-prone: it recruits β-arrestins in response to many ligands and is much less sensitive to mutations, suggesting distinct activation mechanisms compared to CXCR4. To explore the basis of these differences, we compared the dynamics of ACKR3 and CXCR4 complexes with chemokines using molecular dynamics (MD) simulations. Ten-microsecond atomistic MD simulations revealed that CXCR4 adopts a stable active state when bound to WT CXCL12 but transitions to an inactive state when in complex with the antagonist variant, [P2G]CXCL12. By comparison, ACKR3 exhibits a variable transmembrane (TM) 6 state distribution and persistently "active" TM7 when complexed with either WT CXCL12 or [P2G]CXCL12, the latter retaining substantial agonistic activity at ACKR3. We further identified ligand-mediated residue interaction networks in the TM core that regulate TM6 and TM7 activation in CXCR4 but are absent or disrupted in ACKR3, resulting in less constrained receptor dynamics. These findings were validated by BRET-based assays with CXCL12 and ACKR3 mutants. Together, the data suggest that the unique conformational dynamics of ACKR3 govern its activation propensity, its ligand promiscuity, and its atypical effector coupling.

18

Gonadal sex and sex chromosomes each contribute to sexually dimorphic gene expression in threespine stickleback

Treaster, M.; White, M. A.

2026-05-14 genomics 10.64898/2026.05.12.724688 medRxiv

Top 0.4%

1.5%

Many taxa have evolved heteromorphic sex chromosomes like the XY system found in mammals. In addition to the sex determination gene, which directs development of the gonad into an ovary or testis, sex chromosomes can have drastically different gene content, leading to substantial genetic differences between genetic males and females beyond their gonad identity. Studying the effects of these genetic differences is challenging, as the sex chromosomes and sex determination gene are inherited together, so the effects of genetic differences between the X and Y cannot be easily isolated from the hormonal differences produced by the ovary and testis. The threespine stickleback fish has a heteromorphic XY sex chromosome system and a wide range of well-documented sex differences in morphology and behaviors, including complex mating behaviors and male-only parental care. Through genetic manipulation of amhy, the newly identified sex determination gene in threespine stickleback, we are able to generate gonadal males and females with either the XX or XY sex chromosome complement and analyze the separate effects of gonadal sex and sex chromosome complement on sexually dimorphic gene expression. We find that sex chromosomes have a larger effect on gene expression than gonadal sex in somatic tissues, while gonadal sex has a larger effect on expression in the gonads. We also find that the X and Y chromosomes are enriched for genes that show differential expression between females and males. Our findings demonstrate the significant biological impact of sex chromosomes outside of primary sex determination and showcase the utility of the threespine stickleback for studying the genetic basis of sex differences.

19

Directional Gene-Level Concordance and Methodological Constraints in Blood Transcriptomic and DNA Methylation Studies of Parkinson's Disease

Kaur, R.; Dewan, C.; Chauhan, I.; Sharma, K.; Sharma, S.

2026-05-20 neuroscience 10.64898/2026.05.17.725808 medRxiv

Top 0.4%

1.4%

Assessing reproducibility across different molecular profiling studies is a persistent methodological challenge (Zhang et al., 2009; Sweeney et al., 2017; Ioannidis, 2005). Differences in platform technology, cohort composition, analytical pipelines, and feature definitions often make it difficult to interpret cross-study comparisons based solely on gene-identity overlap. In this study, we conducted a retrospective computational analysis of seven publicly available analytical datasets (including alternative analytical pipelines applied to the same cohort) derived from five biologically independent peripheral blood transcriptomic and DNA methylation cohorts, comprising 3,487 samples (1,824 Parkinson's disease cases and 1,663 controls). Reproducibility was evaluated using gene-identity overlap, enrichment-based comparisons, and a permutation-based framework to assess directional consistency of effect estimates across datasets. We also tested the robustness of results by varying false discovery rate thresholds and applying alternative probe-to-gene collapsing strategies. All analyses were performed using reproducible workflows implemented in R and Python with fixed random seeds. Across independent cohorts, gene-identity overlap was generally limited, with enrichment ratios close to one, especially when datasets were generated using different platforms. In several datasets, limited numbers of statistically significant features further constrained overlap-based comparisons. In contrast, directional consistency showed greater stability. High levels of directional consistency were observed across independent cohort comparisons when restricted to overlapping statistically significant features and remained stable across statistical thresholds (90.0% at FDR < 0.05 and 82.8% at FDR < 0.10). When evaluated across the full shared gene universe without conditioning on statistical significance, directional consistency was substantially lower (~30 to 32%) but remained significantly above permutation-based null expectations. Permutation testing confirmed that the observed directional consistency exceeded what would be expected by chance. A combined analysis including methodological replicates (n ≥ 3 datasets) showed 98.3% directional consistency; however, this estimate includes non-independent analytical pipelines applied to the same cohort and reflects analytical stability rather than independent biological replication. Rather than introducing a new statistical method, this study examines how commonly used reproducibility metrics behave under cross-study heterogeneity and identifies their practical limitations and appropriate use boundaries.
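
The abstract does not give the exact definition of directional consistency or the permutation scheme, so the following is only a minimal sketch under stated assumptions: consistency is taken as the fraction of shared genes whose effect estimates have the same sign in two datasets, and the null is built by shuffling the gene labels of one dataset. The function names `directional_consistency` and `permutation_p_value` are hypothetical, and the fixed seed mirrors the abstract's mention of fixed random seeds.

```python
import numpy as np


def directional_consistency(effects_a, effects_b):
    """Fraction of shared genes whose effects have the same sign in both datasets.

    Assumed definition for illustration; genes with a zero effect in either
    dataset are ignored.
    """
    a = np.sign(np.asarray(effects_a, dtype=float))
    b = np.sign(np.asarray(effects_b, dtype=float))
    keep = (a != 0) & (b != 0)
    return float(np.mean(a[keep] == b[keep]))


def permutation_p_value(effects_a, effects_b, n_perm=2000, seed=0):
    """One-sided permutation p-value for the observed directional consistency.

    Assumed null: gene labels of the second dataset are shuffled, which breaks
    the gene-to-gene pairing while keeping each dataset's sign balance.
    """
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    b = np.asarray(effects_b, dtype=float)
    observed = directional_consistency(effects_a, b)
    null = np.array([
        directional_consistency(effects_a, rng.permutation(b))
        for _ in range(n_perm)
    ])
    # Add-one correction so the p-value is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)
```

For example, `directional_consistency([1, 2, -3, 4], [0.5, 1, -1, -2])` gives 0.75, since three of the four gene pairs agree in sign. Restricting the input to genes significant in both datasets, or passing the full shared gene universe, corresponds to the two settings the abstract contrasts.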

20

The 16p11.2 microdeletion enhances gene expression variability between human IPSC derived forebrain interneuron progenitor cells in culture.

Yang, Y.; Quintana-Urzainqui, I.; Pratt, T.

2026-05-24 genetic and genomic medicine 10.64898/2026.05.21.26353723 medRxiv

Top 0.4%

1.3%

The 574 kilobase pair 16p11.2 microdeletion raises a person's odds for neurodevelopmental and energy balance conditions, particularly autism and obesity. There is considerable clinical heterogeneity, and how much this reflects genetic versus environmental or stochastic factors is unclear. Forebrain interneurons originate from progenitors residing in the ventricular zone of the foetal ventral telencephalon, and their perturbation is implicated in a number of 16p11.2 phenotypes, prompting investigation of how the 16p11.2 microdeletion impacts their development. We differentiate human induced pluripotent stem cells (IPSCs), isogenic except for heterozygous 16p11.2 microdeletion to minimise confounding effects of genetic background, to ventral telencephalic interneuron progenitor fate in 2D culture and use single cell RNA sequencing to obtain single cell transcriptome populations for comparative bioinformatics. Hundreds of transcripts are differentially expressed, and many associate with cell signalling, chromatin, neurodevelopmental conditions including autism, and obesity. Pertinently, we find that transcript level variation is significantly greater in 16p11.2 heterozygous progenitors than in their isogenic wild type counterparts. This holds for sets of genes comprising regulons (gene-sets functionally connected by transcription factor regulation) and for randomly selected gene-sets, indicating that the 16p11.2 locus itself has a genome-wide property in stabilising transcription between cells. Regulons with the greatest increase in variation in 16p11.2 heterozygous progenitors exhibit strong enrichment for cell cycle related genes, resonating with our earlier finding of increased cell cycle variability between 16p11.2 heterozygous organoids, and many are regulated by transcription factors associated with autism and/or obesity, reinforcing the idea that unusual transcriptional variation itself contributes to phenotypes.